12. Order of Operations in Text Processing

INSTRUCTOR NOTE:

Here's an example that might help if this is all a little abstract:

Suppose that the text in question is "responsibility is responsive to responsible people" (OK, this doesn't make sense as a sentence, but you know what I mean…).

If you put this into a bag of words straightaway, you get something like

[is: 1
people: 1
responsibility: 1
responsive: 1
responsible: 1]

and then applying stemming gives you

[is: 1
people: 1
respon: 1
respon: 1
respon: 1]

(That is, if you can even find a way to stem the CountVectorizer object in sklearn. The most likely outcome of trying is that your code would just crash…)

Then you would need another post-processing step to get to the following bag of words, which is what you'd get straightaway if you stemmed first:

[is: 1
people: 1
respon: 3]

The merged bag of words, with a single respon entry counted 3 times, is almost certainly the one you want, so stemming first gets you the right answer here.
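
To make the comparison concrete, here is a minimal Python sketch of both orders on the example text above. It is an illustration only: toy_stem is a hypothetical stand-in that strips a few hard-coded suffixes so the three words collapse to "respon" as in the lists above (a real stemmer may produce a different stem), and a plain Counter stands in for the bag of words rather than sklearn's CountVectorizer. Note that it also counts "to", which the lists above leave out.

```python
from collections import Counter

# Hypothetical toy stemmer (an assumption for illustration, not a real
# stemming algorithm): strips a few hard-coded suffixes.
SUFFIXES = ("sibility", "sible", "sive")


def toy_stem(word):
    """Return the word with the first matching suffix removed."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


text = "responsibility is responsive to responsible people"
tokens = text.split()

# Order A: stem first, then build the bag of words.
stem_first = Counter(toy_stem(token) for token in tokens)

# Order B: build the bag of words first, then stem its entries.
bag = Counter(tokens)
stemmed_entries = [(toy_stem(word), count) for word, count in bag.items()]

# Order B leaves duplicate "respon" entries, so it needs an extra
# post-processing step that merges them by summing their counts.
merged = Counter()
for stem, count in stemmed_entries:
    merged[stem] += count

print(stem_first)
print(stemmed_entries)
print(merged == stem_first)
```

With a real pipeline, the stem-first order would typically mean stemming the raw strings before they reach the vectorizer, or supplying a stemming function through CountVectorizer's tokenizer parameter, so that counting only ever sees stems.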